feat(client): add Dynamo inference backend #2773
biswapanda wants to merge 8 commits into
…mo admin to worker system URL
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit a31c60b.

Overview:
Adds NVIDIA Dynamo as an optional inference backend alongside the existing vLLM path, controlled by a new `ClientConfig.backend` field (`"vllm"` | `"dynamo"`). The PR makes three self-contained changes: a pluggable `AdminAPI` abstraction, `renderer_transport` selection for the verifiers wire shape, and a Dynamo teacher-logprobs path for OPD training.

Details:
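To make the backend switch concrete, here is a minimal sketch of how `ClientConfig.backend` could drive the choice of admin implementation. It is not the PR's code: it only maps operations to the routes named in this description and sends nothing over HTTP. The method names (`pause_path`, `resume_path`, `update_weights_path`) and the `"nccl"` / `"filesystem"` broadcast-type strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Protocol


@dataclass
class ClientConfig:
    # Only the two fields this PR adds; the real config has many more.
    backend: Literal["vllm", "dynamo"] = "vllm"
    rl_base_url: Optional[str] = None


class AdminAPI(Protocol):
    """Admin routes the orchestrator needs from an inference server (sketch)."""

    def pause_path(self) -> str: ...

    def resume_path(self) -> str: ...

    def update_weights_path(self, broadcast_type: str) -> str: ...


class VLLMAdminAPI:
    """Existing vLLM admin routes."""

    def pause_path(self) -> str:
        return "/pause"

    def resume_path(self) -> str:
        return "/resume"

    def update_weights_path(self, broadcast_type: str) -> str:
        # In this sketch vLLM uses one route whatever the transport.
        return "/update_weights"


class DynamoAdminAPI(VLLMAdminAPI):
    """Dynamo worker admin over POST /engine/<method>; inherits anything not overridden."""

    def pause_path(self) -> str:
        return "/engine/pause_generation"

    def resume_path(self) -> str:
        return "/engine/resume_generation"

    def update_weights_path(self, broadcast_type: str) -> str:
        # NCCL broadcast and filesystem checkpoints map to different engine methods.
        if broadcast_type == "nccl":
            return "/engine/update_weights_from_distributed"
        return "/engine/update_weights_from_disk"


def setup_admin_api(client_config: ClientConfig) -> AdminAPI:
    # The default "vllm" backend keeps existing configs on the old code path.
    if client_config.backend == "dynamo":
        return DynamoAdminAPI()
    return VLLMAdminAPI()
```

With this shape, callers depend only on the protocol, so supporting another backend means adding one class and one branch in `setup_admin_api`.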
- `packages/prime-rl-configs/src/prime_rl/configs/shared.py`
  - `ClientConfig.backend: Literal["vllm", "dynamo"]`: selects the `AdminAPI` implementation and the verifiers wire shape. The default, `"vllm"`, is a no-op for existing configs.
  - `ClientConfig.rl_base_url`: optional override for the Dynamo RL worker discovery listener (`GET /v1/rl/workers`). When unset, the port is derived from `DYN_RL_PORT` (default 8001).
- `src/prime_rl/utils/client.py`
  - `AdminAPI` Protocol + `VLLMAdminAPI`: extracts the existing vLLM admin paths (`/pause`, `/resume`, `/update_weights`, `/load_lora_adapter`, `/init_broadcaster`) into a typed protocol. `VLLMAdminAPI` methods go through a shared `_admin_post` helper that adds bounded per-attempt timeouts and tenacity retries on 5xx/transport errors (300 s for pause/resume, 720 s for weight updates).
  - `DynamoAdminAPI`: Dynamo worker admin over `POST /engine/<method>`: `pause_generation`, `resume_generation`, `update_weights_from_disk` / `update_weights_from_distributed` (filesystem vs. NCCL paths), and `load_lora_adapter`. Inherits health/model checks from `VLLMAdminAPI`.
  - `setup_admin_api(client_config)`: picks `DynamoAdminAPI` when `backend="dynamo"`, `VLLMAdminAPI` otherwise.
  - `discover_dynamo_admin_base_urls`: resolves worker system URLs from `GET /v1/rl/workers`, queried on `rl_base_url` or, when `rl_base_url` is unset, on `base_url` with its port replaced.
  - `setup_clients`: sets `renderer_transport="dynamo_chat"` on all `vf.ClientConfig` objects when `backend="dynamo"`, and `"vllm_generate"` otherwise. Requires verifiers #1574 + renderers #79.
- `src/prime_rl/orchestrator/utils.py`
  - Splits `compute_teacher_logprobs` into two paths dispatched on `client_config.renderer_transport`: `_compute_teacher_logprobs_vllm` (the existing `/inference/v1/generate` path) and `_compute_teacher_logprobs_dynamo` (`POST /v1/chat/completions` with `nvext.token_data` + `nvext.extra_fields=["prompt_logprobs"]`).
  - `_flatten_prompt_logprobs`: shared flattener that handles both vLLM typed `Logprob` objects and Dynamo's dict shape `{logprob, rank?, decoded_token?}`.

Where should the reviewer start?
- `src/prime_rl/utils/client.py`: the `AdminAPI` protocol (line ~32), the `DynamoAdminAPI` class, `setup_admin_api`, and the `renderer_transport` selection in `setup_clients`. This is the core of the change.
- `src/prime_rl/orchestrator/utils.py`: `_compute_teacher_logprobs_dynamo` and the `compute_teacher_logprobs` dispatcher. Note the placeholder `messages` field, which the Dynamo frontend requires even when `nvext.token_data` is set.
- `packages/prime-rl-configs/src/prime_rl/configs/shared.py`: the two new `ClientConfig` fields; verify that their defaults are backward-compatible.

Related Issues:
- Add `renderer_transport` field to `vf.ClientConfig`
- Add `dynamo_chat` transport to `renderers.generate()`

Note
Medium Risk
Changes the weight-update and NCCL initialization paths when `backend=dynamo`, but the default `vllm` behavior is preserved; misconfigured discovery or engine RPCs could break training on Dynamo deployments.

Overview
Adds NVIDIA Dynamo as an optional inference backend via `ClientConfig.backend` (`"vllm"` | `"dynamo"`, default unchanged) and an optional `rl_base_url` for RL worker discovery.

Admin layer: Inference admin is refactored behind an `AdminAPI` protocol with `VLLMAdminAPI` (existing `/pause`, `/update_weights`, etc.) and `DynamoAdminAPI` (`POST /engine/*`, filesystem vs. NCCL weight updates, LoRA via `load_lora`). Health checks, model checks, weight updates, LoRA loading, and NCCL init all route through the selected implementation.

Dynamo wiring: When `admin_base_url` is unset, worker system URLs are discovered from `GET /v1/rl/workers` (port from `rl_base_url` or `DYN_RL_PORT`). Static pools retry discovery in `wait_for_ready`; elastic pools pin each pod's admin client to the matching `system_url` by IP/DNS.

Rollouts & OPD: `setup_clients` sets `renderer_transport` to `"dynamo"` for the nvext wire shape. `compute_teacher_logprobs` dispatches to vLLM `/inference/v1/generate` or to Dynamo chat completions with `nvext.token_data`. The orchestrator passes `weight_broadcast.type` into `DynamoAdminAPI` to choose between NCCL and disk updates.

Elastic: Model HTTP clients are kept separate and target the OpenAI-compatible URL, while admin calls go to the system server; `backend` is preserved when rebuilding train clients.
Reviewed by Cursor Bugbot for commit 3c41ee3.
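For the OPD path, `_flatten_prompt_logprobs` (described under Details) has to accept two shapes of the same data. Below is a minimal sketch of such a flattener. It is not the PR's code. It assumes vLLM's usual `prompt_logprobs` layout: one entry per prompt position, `None` for the first token, and otherwise a mapping from token id to a logprob entry. It also assumes Dynamo returns the same layout as decoded JSON, so token-id keys arrive as strings. The `Logprob` dataclass is a stand-in for vLLM's class. The name `flatten_prompt_logprobs` and the `0.0` placeholder for the first token are hypothetical.

```python
from collections.abc import Mapping, Sequence
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Logprob:
    """Stand-in for vLLM's typed Logprob; only the fields used here."""

    logprob: float
    rank: Optional[int] = None
    decoded_token: Optional[str] = None


def _entry_logprob(entry: Any) -> float:
    # Dynamo entries are plain dicts ({"logprob", "rank"?, "decoded_token"?});
    # vLLM entries are objects with a .logprob attribute.
    if isinstance(entry, Mapping):
        return float(entry["logprob"])
    return float(entry.logprob)


def flatten_prompt_logprobs(
    prompt_token_ids: Sequence[int],
    prompt_logprobs: Sequence[Optional[Mapping[Any, Any]]],
) -> list[float]:
    """Return one logprob per prompt token, in prompt order."""
    if len(prompt_token_ids) != len(prompt_logprobs):
        raise ValueError("expected one prompt_logprobs entry per prompt token")
    flat: list[float] = []
    for token_id, position in zip(prompt_token_ids, prompt_logprobs):
        if position is None:
            # The first prompt token has no context, so there is no logprob;
            # 0.0 is a placeholder chosen for this sketch.
            flat.append(0.0)
            continue
        # In-process vLLM results use int keys; JSON-decoded responses use strings.
        entry = position.get(token_id)
        if entry is None:
            entry = position.get(str(token_id))
        if entry is None:
            raise KeyError(f"no logprob reported for prompt token {token_id}")
        flat.append(_entry_logprob(entry))
    return flat
```

Accepting both key types and both entry types keeps one code path for both backends. The only per-backend difference is confined to `_entry_logprob`.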